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Abstract 

We present a variational Bayesian method of joint image reconstruction and 
point spread function (PSF) estimation when the PSF of the imaging device 
is only partially known. To solve this semi-blind deconvolution problem, prior 
distributions are specified for the PSF and the 3D image. Joint image recon- 
struction and PSF estimation is then performed within a Bayesian framework, 
using a variational algorithm to estimate the posterior distribution. The image 
prior distribution imposes an explicit atomic measure that corresponds to im- 
age sparsity. Importantly, the proposed Bayesian deconvolution algorithm does 
not require hand tuning. Simulation results clearly demonstrate that the semi- 
blind deconvolution algorithm compares favorably with previous Markov chain 
Monte Carlo (MCMC) version of myopic sparse reconstruction. It significantly 
outperforms mismatched non-blind algorithms that rely on the assumption of 
the perfect knowledge of the PSF. The algorithm is illustrated on real data from 
magnetic resonance force microscopy (MRFM). 
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1. Introduction 

The standard and popular image deconvolution techniques generally assume 
that the space- invariant instrument response, i.e., the point spread function 
(PSF), is perfectly known. However, in many practical situations, the true 
PSF is either unknown or, at best, partially known. For example, in an opti- 
cal system a perfectly known PSF does not exist because of light diffraction, 
apparatus/lense aberration, out-of-focus, or image motion [HE]- Such imper- 
fections are common in general imaging systems including MRFM, where there 
exist additional model PSF errors in the sensitive magnetic resonance condition 
In such circumstances, the PSF required in the reconstruction process is 
mismatched with the true PSF. The quality of standard image reconstruction 
techniques may suffer from this disparity. To deal with this mismatch, decon- 
volution methods have been proposed to estimate the unknown image and the 
PSF jointly. When prior knowledge of the PSF is available, these methods are 
usually referred to as semi-blind deconvolution [U [5] or myopic deconvolution 

0CDIE]. 

In this paper, we formulate the semi-blind deconvolution task as an estima- 
tion problem in a Bayesian setting. Bayesian estimation has the great advantage 
of offering a flexible framework to solve complex model-based problems. Prior 
information available on the parameters to be estimated can be efficiently in- 
cluded within the model, leading to an implicit regularization of our ill-posed 
problem. In addition, the Bayes framework produces posterior estimates of un- 
certainty, via posterior variance and posterior confidence intervals. Extending 
our previous work, we propose a variational estimator for the parameters as 
contrasted to the Monte Carlo approach in [9]. This extension is non-trivial. 
Our variational Bayes algorithm iterates on a hidden variable domain associ- 
ated with the mixture coefficients. This algorithm is faster, more scalable for 
equivalent image reconstruction qualities in [9]. 

Like in [9] , the PSF uncertainty is modeled as the deviation of the a priori 
known PSF from the true PSF. Applying an eigendecomposition to the PSF co- 
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variance, the deviation is represented as a linear combination of orthogonal PSF 
bases with unknown coefficients that need to be estimated. Furthermore, we 
assume the desired image is sparse, corresponding to the natural sparsity of the 
molecular image. The image prior is a weighted sum of a sparsity inducing part 
and a continuous distribution; a positive truncated Laplacian and atom at zero 
(LAZE) prioi^HK Similar priors have been applied to estimating mixtures of 
densities [TUHHEI] and sparse, nonncgative hyperspectral unmixing [T3]. Here 
we introduce a hidden label variable for the contribution of the discrete mass 
(empty pixel) and a continuous density function (non-empty pixel). Similar to 
our 'hybrid' mixture model, inhomogeneous gamma-Gaussian mixture models 
have been proposed in |15j . 

Bayesian inference of parameters from the posterior distribution generally 
requires challenging computations, such as functional optimization and numer- 
ical integration. One widely advocated strategy relies on approximations to 
the minimum mean square error (MMSE) or maximum a posteriori (MAP) es- 
timators using samples drawn from the posterior distribution. Generation of 
these samples can be accomplished using Markov chain Monte Carlo methods 
(MCMC) [IB] . MCMC has been successfully adopted in numerous imaging prob- 
lems such as image segmentation, denoising, and deblurring [171 I16| . Recently, 
to solve blind deconvolution, two promising semi-blind MCMC methods have 
been suggested [5] [IS] . However, these sampling methods have the disadvantage 
that convergence may be slow. 

An alternative to Monte Carlo integration is a variational approximation to 
the posterior distribution, and this approach is adopted in this paper. These 
approximations have been extensively exploited to conduct inference in graph- 
ical models [19] . If properly designed, they can produce an analytical poste- 
rior distribution from which Bayesian estimators can be efficiently computed. 
Compared to MCMC, variational methods are of lower computational complex- 

A Laplace distribution as a prior distribution acts as a sparse regularization using £i 
norm. This can be seen by taking negative logarithm on the distribution. 
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ity, since they avoid stochastic simulation. However, variational Bayes (VB) 
approaches have intrinsic limits; the convergence to the true distribution is 
not guaranteed, even though the posterior distribution will be asymptotically 
normal with mean equal to the maximum likelihood estimator under suitable 
conditions |20j . In addition, variational Bayes approximations can be easily im- 
plemented for only a limited number of statistical models. For example, this 
method is difficult to apply when latent variables have distributions that do not 
belong to the exponential family (e.g. a discrete distribution [5]). For mixture 
distributions, variational estimators in Gaussian mixtures and in exponential 
family converge locally to maximum likelihood estimator [5TJ [22] . The theo- 
retical convergence properties for sparse mixture models, such as our proposed 
model, are as yet unknown. This has not hindered the application of VB to 
sparse models to problems in our sparse image mixture model. Another possi- 
ble intrinsic limit of the variational Bayes approach, particularly in (semi)-blind 
deconvolution, is that the posterior covariance structure cannot be effectively 
estimated nor recovered, unless the true joint distributions have independent 
individual distributions. This is primarily because VB algorithms are based on 
minimizing the KL-divergence between the true distribution and the VB ap- 
proximating distribution, which is assumed to be factorized with respect to the 
individual parameters. 

However, despite these limits, VB approaches have been widely applied with 
success to many different engineering problems [231 HH IH3 [35]. A principal 
contribution of this paper is the development and implementation of a VB al- 
gorithm for mixture distributions in a hierarchical Bayesian model. Similarly, 
the framework permits a Gaussian prior [27] or a Student's-t prior 28J for the 
PSF. We present comparisons of our variational solution to other blind decon- 
volution methods. These include the total variation (TV) prior for the PSF [29] 
and natural sharp edge priors for images with PSF regularization [30] . We also 
compare to basis kernels [2E] , the mixture model algorithm of Fergus et al. [3T] , 
and the related method of Shan et al. [32) under a motion blur model. 

To implement variational Bayesian inference, prior distributions and the 
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instrument-dependent likelihood function are specified. Then the posterior dis- 
tributions are estimated by minimizing the Kullback-Leibler (KL) distance be- 
tween the model and the empirical distribution. Simulations conducted on syn- 
thetic images show that the resulting myopic deconvolution algorithm outper- 
forms previous mismatched non-blind algorithms and competes with the previ- 
ous MCMC-based semi-blind method [9] with lower computational complexity. 

We illustrate the proposed method on real data from magnetic resonance 
force microscopy (MRFM) experiments. MRFM is an emerging molecular imag- 
ing modality that has the potential for achieving 3D atomic scale resolution 
[551 1531 155] . Recently, MRFM has successfully demonstrated imaging [55] [57J of 
a tobacco mosaic virus [55]. The 3D image reconstruction problem for MRFM 
experiments was investigated with Wiener filters [5§1 001 [57], iterative least 
square reconstruction approaches 0IJ [38j 02] , and recently the Bayesian esti- 
mation framework [TO] H5J [5J H] ■ The drawback of these approaches is that they 
require prior knowledge on the PSF. However, in many practical situations of 
MRFM imaging, the exact PSF, i.e., the response of the MRFM tip, is only 
partially known [3 . The proposed semi-blind reconstruction method accounts 
for this partial knowledge. 

The rest of this paper is organized as follows. Section [2] formulates the 
imaging deconvolution problem in a hierarchical Bayesian framework. Section 
[3] covers the variational methodology and our proposed solutions. Section [4] 
reports simulation results and an application to the real MRFM data. Section 
[5] discusses our findings and concludes. 

2. Formulation 

2.1. Image Model 

As in [9] 03] , the image model is defined as: 

y = Hx + n = T(K,x) + n, (1) 
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where y is a P x 1 vectorized measurement, x = [x±, . . . , xn] t b is an N X 1 
vectorized sparse image to be recovered, T (n, •) is a convolution operator with 
the PSF k, H = [h!,...,hjv] is an equivalent system matrix, and n is the 
measurement noise vector. In this work, the noise vector n is assumed to be 
Gaussiarj^J n ~ Af (0, c 2 Ip) . The PSF k is assumed to be unknown but a 
nominal PSF estimate kq is available. The semi-blind deconvolution problem 
addressed in this paper consists of the joint estimation of x and k, from the 
noisy measurements y and nominal PSF kq. 

2.2. PSF Basis Expansion 

The nominal PSF Kq is assumed to be generated with known parameters 
(gathered in the vector £ ) tuned during imaging experiments. However, due to 
model mismatch and experimental errors, the true PSF k may deviate from the 
nominal PSF kq. If the generation model for PSFs is complex, direct estimation 
of a parameter deviation, A£ = Ctrue ~ Co, is difficult. 

We model the PSF n (resp. {H}) as a perturbation about a nominal PSF 
Kq (resp. {H }) with K basis vectors Kfe, k — 1, . . . ,K, that span a subspace 
representing possible perturbations Ab. We empirically determined this basis 
using the following PSF variational eigendecomposition approach. A number 
of PSFs k are generated following the PSF generation model with parameters 
£ randomly drawn according to Gaussian distribution^] centered at the nom- 
inal values Co- Then a standard principal component analysis (PCA) of the 
residuals {kj — Ko} J= i is used to identify K principal axes that are associ- 
ated with the basis vectors k^. The necessary number of basis vectors, K, is 
determined empirically by detecting a knee at the scree plot. The first few 
eigenfunctions, corresponding to the first few largest eigenvalues, explain major 
portion of the observed perturbations. If there is no PSF generation model, then 
we can decompose the support region of the true (suspected) PSF to produce 

2 Af(n, X) denotes a Gaussian random variable with mean /x and covariance matrix X. 
3 The variances of the Gaussian distributions are carefully tuned so that their standard 
deviations produce a minimal volume ellipsoid that contains the set of valid PSFs. 
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an orthonormal basis. The necessary number of the bases is again chosen to 
explain most support areas that have major portion/energy of the desired PSF. 
This approach is presented in our experiment with Gaussian PSFs. 

We use a basis expansion to present t(c) as the following linear approxima- 
tion to K, 

K 

k(c) = k + ^c 4 k 4 , (2) 

i=l 

where the {c{\ determine the PSF relative to this bases. With this parameter- 
ization, the objective of semi-blind deconvolution is to estimate the unknown 
image, x, and the linear expansion coefficients c = [ci, . . . , ck } T ■ 

2.3. Determination of Priors 

The priors on the PSF, the image, and the noise are constructed as latent 
variables in a hierarchical Bayesian model. 

2.3.1. Likelihood function 

Under the hypothesis that the noise in is white Gaussian, the likelihood 
function takes the form 

p 

^1 w 2 ) = (^) ° * 

where ||-|| denotes the £2 norm ||x|| 2 = x T x. 

2.3.2. Image and label priors 

To induce sparsity and positivity of the image, we use an image prior con- 
sisting of "a mixture of a point mass at zero and a single-sided exponential 
distribution" [10, 4"3l 19]. This prior is a convex combination of an atom at zero 
and an exponential distribution: 

p{xi\a, tu) = (1 - w)S(xi) + wg(xi\a). (4) 
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In Q, S(-) is the Dirac delta function, w = P (a?j 7^ 0) is the prior proba- 
bility of a non-zero pixel and (7(0;^ |a) = ^ exp (— 1r* (xi) is a single-sided 
exponential distribution where is a set of positive real numbers and 1e( - ) 
denotes the indicator function on the set E: 

, 1, if x G E; 
Mx)={ (5) 
0, otherwise. 

A distinctive property of the image prior Q is that it can be expressed as a 
latent variable model 

p(xi\a, Zi) = (1 - Zi)S(Xi) + Zig(xi\a), (6) 

where the binary variables {zi} 1 ^ are independent and identically distributed 
and indicate if the pixel Xi is active 



1) if ^ ^ 0; 
: < (7) 
0, otherwise. 



and have the Bernoulli probabilities: z; t ~ Ber(w). 

The prior distribution of pixel value Xi in Q can be rewritten conditionally 
upon latent variable Zi 

p{x i \z l = 0) = S (x.^ , 
p(xi\a, z t = 1) = g(xi\a), 

which can be summarized in the following factorized form 

p(xi\a,Zi) = S{x i ) 1 ~ z '-g(x l \a) z \ (8) 

By assuming each component Xi to be conditionally independent given z\ and 
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a, the following conditional prior distribution is obtained for x: 

N 

p(x|a,z) = H[6(x i ) 1 -**g(x i \ay<] (9) 



where z = [z\, . . . , zjsr]- 

This factorized form will turn out to be crucial for simplifying the variational 
Bayes reconstruction algorithm in Section [3j 

2.3.3. PSF parameter prior 

We assume that the PSF parameters c\, . . . , ck are independent and c& is 
uniformly distributed over intervals 

S k = [-Ac k ,Ac k }. (10) 

These intervals are specified a priori and are associated with error tolerances 
of the imaging instrument. The joint prior distribution of c = [c 1; . . . , ck] T is 
therefore: 

^c)=n^-i 5fc (c fe ). (ii) 

k=l k 

2.3.4- Noise variance prior 

A conjugate inverse-Gamma distribution with parameters ?o and <;i is as- 
sumed as the prior distribution for the noise variance (See jAppendix A.l for 
the details of this distribution) : 

a 2 \s ^i ~ig{^^i)- (12) 



The parameters <^o and will be fixed to a number small enough to obtain a 
vague hyperprior, unless we have good prior knowledge. 

2. 4- Hyperparameter Priors 

As reported in [101113]; the values of the hyperparameters {a, w} greatly 
impact the quality of the deconvolution. Following the approach in [5], we 



9 



propose to include them within the Bayesian model, leading to a second level of 
hierarchy in the Bayesian paradigm. This hierarchical Bayesian model requires 
the definition of prior distributions for these hyperparameters, also referred to 
as hyperpriors which are defined below. 

2.4-1- Hyperparameter a 

A conjugate inverse-Gamma distribution is assumed for the Laplacian scale 
parameter a: 

a\a ~TQ (ao,ai) , (13) 

with a = [ao,ai] T . The parameters ao and ot\ will be fixed to a number small 
enough to obtain a vague hyperprior, unless we have good prior knowledge. 

2-4-2. Hyperparameter w 

We assume a Beta random variable with parameters (/3o,/?i), which are 
iteratively updated in accordance with data fidelity. The parameter values will 
reflect the degree of prior knowledge and we set /?o = 01 — 1 to obtain a non- 



informative prior. (See Appendix A. 2 for the details of this distribution) 



ti;~B(A),A)- (14) 

2.5. Posterior Distribution 

The conditional relationships between variables is illustrated in Fig. [I] The 
resulting posterior of hidden variables given the observation is 

p(x, a,z,w,c, cr 2 |y) cx p(y|x, c, a 2 ) 

x p(x\a, z)p(z\w)p(w)p(a)p(c)p(a 2 ) . (15) 

Since it is too complex to derive exact Bayesian estimators from this posterior, 
a variational approximation of this distribution is proposed in the next section. 
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Figure 1: Conditional relationships between variables. A node at an arrow tail conditions the 
node at the arrow head. 

3. Variational Approximation 

3.1. Basics of Variational Inference 

In this section, we show how to approximate the posterior densities within a 
variational Bayes framework. Denote by U the set of all hidden parameter vari- 
ables including the image variable x in the model, denoted by M.. The hierarchi- 
cal model implies the Markov representation p(y, U|.M) = p(y|U, Ai)p(XJ\A4). 
Our objective is to compute the posterior p(x.\y,M) = J p(y\U, M)p(U\M)dU\- x _/p(y\M) , 
where U\ x is a set of variables in U except x. Let q be any arbitrary distribution 
ofU. Then 

\np(y\M)=C(q)+KL(q\\p) (16) 

with 

W< (U 'M™> (17) 

KL(«||p) = - J q(U\M) In {^^f) dU. (18) 

We observe that maximizing the lower bound C(q) is equivalent to mini- 
mizing the Kullback-Leibler (KL) divergence KL(q||p). Consequently, instead 
of directly evaluating p(y\A4) given Ai, we will specify a distribution q(XJ\A4) 
that approximates the posterior p(U|y, M.). The best approximation maximizes 
C(q). We present Algorithm [l] that iteratively increases the value of C(q) by 
updating posterior surrogate densities. To obtain a tractable approximating 
distribution q, we will assume a factorized form as q(V) = Yij^i^j) where 
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U has been partitioned into disjoint groups TJj. Subject to this factorization 
constraint, the optimal distribution q* (U) = ■ g*(Uj) is given by 

In q* (Vj) = E\u. [lnp(U, y)] + (const), Vj (19) 

where E\u,. denotes the expectatiorj^jwith respect to all factors U, except i = j. 
We will call q* (U) the posterior surrogate for p. 

3.2. Suggested Factorization 

Based on our assumptions on the image and hidden parameters, the random 
vector is U = {9, <fi} = {x, a, z, it), c, a 2 } with 9 = {x, z, c} and 4> — {a, w, c 2 }- 
We propose the following factorized approximating distribution 

?(U) = q(x, a, z, w, c, a 2 ) = q(x, z, c)q(a, w, a 2 ). (20) 

Ignoring constantf]^] ( fl9] ) leads to 

lng(a, w, cr 2 ) = Ey a lnp(x|a, z)p(a) + 

s 

In g(a) 

E Xtu lnp(z|u-)?3(w) + E V 2 lnp(y|x, ct 2 )p(ct 2 ) (21) 

S V ' S V ' 

In g(t«) lnq(cr 2 ) 

which induces the factorization 

q{ct>) = q(a)q(w)q(a 2 ). (22) 

4 In the sequel, we use both E [•] and (■} to denote the expectation. To make our expressions 
more compact, we use subscripts to denote expectation with respect to the random variables 
in the subscripts. These notations with the subscripts of '\v' denote expectation with respect 
to all random variables except for the variable v. e.g. E^u^ 

5 In the sequel, constant terms with respect to the variables of interest can be omitted in 
equations. 
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Similarly, the factorized distribution for x, z and c is 



Q(0) 



q(z)q(c) 



(23) 



leading to the fully factorized distribution 



q(0,<j>) 



q{a)q(z)q(w)q(c)q(a 2 ) 



(24) 



3.3. Approximating Distribution q 

In this section, we specify the marginal distributions in the approximated 



posterior distribution required in ( 24 ) . More details are described in Appendix 



[B| The parameters for the posterior distributions are evaluated iteratively due 
to the mutual dependence of the parameters in the distributions for the hidden 
variables, as illustrated in Algorithm [T] 



3.3.1. Posterior surrogate for a 

q(a) =Z£(a ,ai), 
with a Q = a + J2( z i)^l = a l + J2( z i x i)- 

3.3.2. Posterior surrogate for w 



(25) 



q(w)=B0 Q Ji), 
with /3 = (3 + N - J2( Zi ) ,h = ft + J2( Zl ). 



(26) 



3.3.3. Posterior surrogate for a 2 



q(a 2 )=ig(to,Si), (27) 

with^o = + ft = (||y-Hx|| 2 )/2 + ft , and (||y-Hx|| 2 ) = || y -(H)(x)|| 2 + 
varfo] [||<k)|| 2 + J2i CT cJ«z|| 2 ] + Ez ^cJH'fc)!! 2 , where a Cl is the variance of 
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the Gaussian distribution <?(q) given in (33) and varfaxi] is computed under the 
distribution q{xi) defined in the next section and described in Appendix B.3 



3.3.4- Posterior surrogate for x 
We first note that 

In g(x, z) = In g(x|z)g(z) = E [lnp(y |x, a 2 )p(x\a, z)p(z|u>)] . (28) 

The conditional density of x given z is p(x\a,z) — Yif 9zii x i)> where go(xi) — 
S(xi), gi(xi) = g(xi\a). Therefore, the conditional posterior surrogate for Xi is 

q(x i \z i = 0) = 6(x i ), (29) 
q(xi\zi = 1) = <j) + (fii,r]i), (30) 

where (/>+(/i,cr 2 ) is a positively truncated-Gaussian density function with the 
hidden mean fj, and variance a 2 , r\i = l/[(|jhi|| 2 )(l/cr 2 )], \ii = r]i[(hf ei)(l/a 2 ) — 
(1/a)], = y — Hx_„ x_, ; is x except for the ith entry replaced with 0, and hj 
is the zth column of H. Therefore, 

q(xi) = q(zi = 0)S(xi) + q(zi = l)<f> + ((j,i,r)i), (31) 

which is a Bernoulli truncated-Gaussian density. 

3.3.5. Posterior surrogate for z 
For i=l,...,N, 

q{ Zl = 1) = 1/[1 + C[\ and q( Zl = 0) = 1 - q( Zi = 1), (32) 

with C[ = exp(C l /2 x ? /fi + Mi5o/«i +lnai - ^(<S ) + i>0o) ~ i>0i))- i> is 
the digamma function and C$ — (||hi|| 2 )(/i 2 + r/i) — 2(e[ hj}/^. 
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3.3.6. Posterior surrogate for c 
For 3=1,... ,K, 

q{cj) = (j>{n Cj ,a Cj ), 



(33) 



where </>(/i, <r) is the probability density function for the normal distribution with 

(x T IF T y - xHJ T H°x - , . x T H^ T H i Q x) 

the mean a and variance a, u r — ™ — , 

(x T H 3 H J x) 

and l/a Cj = (l/a 2 )(x. T W T W X ). 

Algorithm 1 VB semi-blind image reconstruction algorithm 

1: % Initialization: 

2: Initialize estimates (x^ ^), (z^ )), and w(°\ and set c = to have = kq, 



% Iterations: 

for t = 1,2, 



do 

~(t) z,M 



Evaluate a^' ,0^' in ( [25 ) by using (x^ ^ 
Evaluate jffi ffi in pi by using (z^ 1 ); 



Evaluate f^,^ in Q from (||y - Hx|| 2 ) 
for i = 1,2,..., AT do 



Evaluate necessary statistics (/ij,r?j) for g(^i|zj = 1) in (29), 



Evaluate q(zi = 1) in (32), 
Evaluate (a;,-), var[xj], 

For I = 1,.. . ,K, evaluate // C! , l/cr C( for <?(q) in (33), 
end for 
end for 



The final iterative algorithm is presented in Algorithm [T] where required 
shaping parameters under distributional assumptions and related statistics are 
iteratively updated. 



4. Simulation Results 

We first present numerical results obtained for Gaussian and typical MRFM 
PSFs, shown in Fig. [2] and Fig. |6j respectively. Then the proposed variational 
algorithm is applied to a tobacco virus MRFM data set. There are many pos- 
sible approaches to selecting hyperparameters, including the non-informative 
approach of [9] and the expectation- maximization approach of [12J. In our ex- 
periments, hyper-parameters q , <r 1; a , and a.\ for the densities are chosen based 
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on the framework advocated in [5] . This leads to the vague priors corresponding 
to selecting small values ft) = ?i = c^o = o-i — 1- For w, the noninformative 
initialization is made by setting /3q = Pi = 1, which gives flexibility to the sur- 
rogate posterior density for w. The resulting prior Beta distribution for to is a 
uniform distribution on [0, 1] for the mean proportion of non-zero pixels. 

«;~B(A),/9i)~W([0,l]). (34) 

The initial image used to initialize the algorithm is obtained from one Landwe- 
ber iteration [44] . 

4--1- Simulation with Gaussian PSF 

The true image x used to generate the data, observation y, the true PSF, and 
the initial, mismatched PSF are shown in Fig. [2j Some quantities of interest, 
computed from the outputs of the variational algorithm are depicted as functions 
of the iteration number in Fig. [3j These plots indicate that convergence to the 
steady state is achieved after few iterations. In Fig. [3j E [w] and E[l/a] get 
close to the true level but E [l/c 2 ] shows a deviation from the true values. 
This large deviation implies that our estimation of noise level is conservative; 
the estimated noise level is larger than the true level. This relates to the large 
deviation in projection error from noise level (Fig. 3(a)[ ). The drastic changes in 



the initial steps seen in the curves of E [1/a] , E [w] are due to the imperfect prior 
knowledge (initialization). The final estimated PSF and reconstructed image 
are depicted in Fig. |4j along with the reconstructed variances and posterior 
probability of zi ^ 0. We decomposed the support region of the true PSF to 
produce orthonormal bases shown in Fig. [5] We extracted 4 bases because 

these four PSF bases clearly explain the significant part of the true Gaussian 
PSF. In other words, little energy resides outside of this basis set in PSF space. 

The reconstructed PSF clearly matches the true one, as seen in Fig. [2] and 
Fig. [3] Note that the restored image is slightly attenuated while the restored 
PSF is amplified because of intrinsic scale ambiguity. 
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Figure 2: Experiment with Gaussian PSF: true image, observation, true PSF, and mismatched 
PSF (#so). 
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line) 

Figure 3: Result of Algorithm [TJ curves of residual, error, E [1/a] , E [l/cr 2 ] ,E [w] ,E [c], as 
functions of number of iterations. These curves show how fast the convergence is achieved. 



4.2. Simulation with MRFM type PSFs 

The true image x used to generate the data, observation y, the true PSF, 
and the initial, mismatched PSF are shown in Fig. [6| The PSF models the PSF 
of the MRFM instrument, derived by Mamin et al. [$]. The convergence of 
the algorithm is achieved after the 10th iteration. The reconstructed image can 
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(a) Estimated PSF (b) Estimated image 




10 20 30 10 20 30 

(c) Variance map (d) Weight map 



Figure 4: (a) Restored PSF, (b) image, (c) map of pixel-wise (posterior) variance, and (d) 
weight map. ft = Eft is close to the true one. A pixel-wise weight shown in (d) is the posterior 
probability of the pixel being a nonzero signal. 



be compared to the true image in Fig. [7j where the pixel-wise variances and 
posterior probability of zi ^ are rendered. The PSF bases are obtained by 
the procedure proposed in Section |2.2| with the simplified MRFM PSF model 
and the nominal parameter values [10] . Specifically, by detecting a knee K = 4 
at the scree plot, explaining more than 98.69% of the observed perturbations 
(Fig. 3 in [S]), we use the first four eigenfunctions, corresponding to the first four 
largest eigenvalues. The resulting K — 4 principal basis vectors are depicted 
in Fig. [8] The reconstructed PSF with the bases clearly matches the true one, 
as seen m Fig. [6] and Fig. [7] 

4-3. Comparison with PSF -mismatched reconstruction 

The results from the variational deconvolution algorithm with a mismatched 
Gaussian PSF and a MRFM type PSF are presented in Fig. [9] and Fig. 10 re- 



spectively; the relevant PSFs and observations are presented in Fig.[2]in Section 
4.1| and in Fig. [6] in Section |4.2[ respectively. Compared with the results of our 
VB semi-blind algorithm (Algorithm [lj , shown in Fig. [1] and Fig. [7j the recon- 
structed images from the mismatched non-blind VB algorithm in Fig. [9] and 



18 



2 4 6 8 10 

(a) The first basis Ki 



2 4 6 8 10 

(b) The second basis K2 




0.3 


2 




4 


0.2 


G 


0.1 


8 


10 



2 4 6 8 10 
(c) The third basis K3 




2 4 6 8 10 

(d) The fourth basis H4 



Figure 5: PSF bases, Ki, . . . , K4, for Gaussian PSF. 

Fig. [lOj respectively, inaccurately estimate signal locations and blur most of the 
non-zero values. 

Additional experiments (not shown here) establish that the PSF estimator 
is very accurate when the algorithm is initialized with the true image. 



4- 4- Comparison with other algorithms 

To quantify the comparison, we performed experiments with the same set 
of four sparse images and the MRFM type PSFs as used in [9] . By generating 
100 different noise realizations for 100 independent trials with each true image, 
we measured errors according to various criteria. We tested four sparse images 
with sparsity levels ||x||o = 6, 11, 18,30. 

Under these criteria^) Fig. 11 visualizes the reconstruction error performance 
for several measures of error. From these figures we conclude that the VB semi- 
blind algorithm performs at least as well as the previous MCMC semi-blind al- 



6 Note that the Iq norm has been normalized. The true image has value 1; ||x||o/||x||o is 
used for MCMC method; E [w] X jV/||x||n for variational method since this method does not 
produce zero pixels but E [w] . 

Note also that, for our simulated data, the (normalized) true noise levels are ||n|| 2 /||x||o = 
0.1475, 0.2975, 0.2831, 0.3062 for ||x|| = 6, 11, 18, 30, respectively. 
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(d) Mismatched PSF 

Figure 6: Experiment with simplified MRFM PSF: true image, observation, true PSF, and 
mismatched PSF (reo)- 

gorithm. In addition, the VB method outperforms AM [45 and the mismatched 
non-blind MCMC [33] methods. In terms of PSF estimation, for very sparse 
images the VB semi-blind method seems to outperform the MCMC method. 
Also, the proposed VB semi-blind method converges more quickly and requires 
fewer iterations. For example, the VB semi-blind algorithm converges in ap- 
proximately 9.6 seconds after 12 iterations, but the previous MCMC algorithm 
takes more than 19.2 seconds after 40 iterations to achieve convergenc^j 

In addition, we made comparisons between our sparse image reconstruction 
method and other state-of-the-art blind deconvolution methods [S7J 133 12^1 
I31ll32j . as shown in our previous work [§]. These algorithms were initialized with 
the nominal, mismatched PSF and were applied to the same sparse image as our 
experiment above. For a fair comparison, we made a sparse prior modification 
in the image model of other algorithms, as needed. Most of these methods do 
not assume or fit into the sparse model in our experiments, thus leading to poor 
performance in terms of image and PSF estimation errors. Among these tested 

7 The convergence here is defined as the state where the change in estimation curves over 
time is negligible. 
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Figure 7: Restored PSF and image with pixel-wise variance and weight map. n = Ek is close 
to the true one. 



algorithms, two of them, proposed by Tzikas et al. [28] and Almeida et al. [30] , 
produced non-trivial and convergent solutions and the corresponding results are 



compared to ours in Fig. 11 By using basis kernels the method proposed by 
Tzikas et al. [28] uses a similar PSF model to ours. Because a sparse image 
prior is not assumed in their algorithm [28], we applied their suggested PSF 
model along with our sparse image prior for a fair comparison. The method 
proposed by Almeida et al. [30 exploits the sharp edge property in natural 
images and uses initial, high regularization for effective PSF estimation. Both 
of these perform worse than our VB method as seen in Fig. |11[ The remaining 
algorithms 27, 29, 3TJGI2], which focus on photo image reconstruction or motion 
blur, either produce a trivial solution (x w y) or are a special case of Tzikas's 
model [25] . 

To show lower bound our myopic reconstruction algorithm, we used the Iter- 
ative Shrinkage/Thresholding (1ST) algorithm with a true PSF. This algorithm 
effectively restores sparse images with a sparsity constraint [46]. We demon- 
strate comparisons of the computation time[^] of our proposed reconstruction 



'Matlab is used under Windows 7 Enterprise and HP-Z200 (Quad 2.66 GHz) platform. 
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Figure 8: PSF bases, rei, 



,#s 4 , for MRFM PSF. 



algorithm to that of others in Table [T] 

Table 1: Computation time of algorithms (in seconds), for the data in Fig.[& 



Our method 


9.58 


semi-blind MC 


19.20 


Bayesian nonblind 43 


3.61 


AM gS] 


0.40 


Almeida's method [30] 


5.63 


Amizic's method [2"9"] 


5.69 


Tzikas's method [15] 


20.31 


(oracle) 1ST gS] 


0.09 



4.5. Application to tobacco mosaic virus (TMV) data 

We applied the proposed variational semi-blind sparse deconvolution algo- 
rithm to the tobacco mosaic virus data, made available by our IBM collaborators 



[38] . shown in the first row in Fig. 12 Our algorithm is easily modifiable to these 
3D raw image data and 3D PSF with an additional dimension in dealing with 
basis functions to evaluate each voxel value Xi. The noise is assumed Gaussian 
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Figure 9: (mismatched) Non-blind result with a mismatched Gaussian PSF. 



[36J [38] and the four PSF bases are obtained by the procedure proposed in 2.2 
with the physical MRFM PSF model and the nominal parameter values [3] . The 



reconstruction of the 6th layer is shown in Fig. 12(b) and is consistent with the 
results obtained by other methods, (see 0H3].) The estimated deviation in 
PSF is small, as predicted in [9]. 

While they now exhibit similar smoothness, the VB and MCMC images are 
still somewhat different since each algorithm follows different iterative trajectory 
in the high dimensional space of 3D images, thus converging possibly to slightly 
different stopping points near the maximum of the surrogate distribution. We 
conclude that the two images from VB and MCMC are comparable in that both 
represent the 2D SEM image well, but VB is significantly faster. 

4-6. Discussion 

In blind deconvolution, joint identifiability is a common issue. For example, 
because of scale ambiguity, the unicity cannot be guaranteed in a general set- 
ting. It is not proven in our solution either. However, the shift /time ambiguity 
issue noticed in [47] is implicitly addressed in our method using a nominal and 
basis PSFs. Moreover, our constraint on the PSF space using a basis approach 
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Figure 10: (mismatched) Non-blind result with a mismatched MRFM type PSF. 



effectively excludes a delta function as a PSF solution, thus avoiding the trivial 
solution. Secondly, the PSF solution is restricted to this linear spanning space, 
starting form the initial, mismatched PSF. We can, therefore, reasonably expect 
that the solution provided by the algorithm is close to the true PSF, away from 
the trivial solution or the initial PSF. 

To resolve scale ambiguity in a MCMC Bayesian framework, stochastic samplers 
are proposed in |47j by imposing a fixed variance on a certain distributiorj^] An- 
other approach to resolve the scale ambiguity is to assume a hidden scale variable 
that is multiplied to the PSF and dividing the image (or vice versa.), where the 
scale is drawn along each iteration of the Gibbs sampler [4"8] , 



5. Conclusion 

We suggested a novel variational solution to a semi-blind sparse deconvolu- 
tion problem. Our method uses Bayesian inference for image and PSF restora- 
tion with a sparsity-inducing image prior via the variational Bayes approxima- 
tive note that this MCMC method designed for ID signal deconvolution is not efficient for 
analyzing 2D and 3D images, since the grouped and marginalized samplers are usually slow 
to converge requiring hundreds of iterations [47] ■ 
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Figure 11: For various image sparsity levels (x-axis: log 10 ||x||o), performance of several blind, 
semi-blind, and nonblind deconvolution algorithms: the proposed method (red), AM (blue), 
Almeida's method (green), Tzikas's method (cyan), semi-blind MC (black), mismatched non- 
blind MC (magenta). Errors are illustrated with standard deviations, (a): Estimated sparsity. 
Normalized true level is 1 (black circles), (b): Normalized error in reconstructed image. For 
the lower bound, information about the true PSF is only available to the oracle 1ST (black 
circles), (c): Residual (projection) error. The noise level appears in black circles, (d): PSF 
recovery error, as a performance gauge of our semi-blind method. At the initial stage of the 
algorithm, |[ ii^ii — ^fHi = 0-5627. (Some of the sparsity measure and residual errors are 
too large to be plotted together with results from other algorithms.) 



tion. Its power in automatically producing all required parameter values from 
the data merits further attention for the extraction of image properties and 
retrieval of necessary features. 

From the simulation results, we conclude that the performance of the VB 
method competes with MCMC methods in sparse image estimation, while re- 
quiring fewer computations. Compared to a non-blind algorithm whose mis- 
matched PSF leads to imprecise and blurred signal locations in the restored 
image, the VB semi-blind algorithm correctly produces sparse image estimates. 
The benefits of this solution compared to the previous solution [9j are faster 
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Figure 12: (a) TMV raw data, (b) estimated virus image by VB, (c) estimated virus image 
by MCMC 9_, and (d) virus image from electron microscope |38| . 



convergence and stability of the method. 



Appendix A. Useful Distributions 

Appendix A.l. Inverse Gamma Distribution 



b a 



The density of an inverse Gamma random variable X ~ XQ (a, b) is p7~y a: ° 1 ex P( — 
forze (0,oo). EX- 1 =a/&andEln(X) = ln(»-V(a)- 

Appendix A. 2. Beta Distribution 

r f ftir (b^ 

The density of a Beta random variable X ~ B(a,b) is ^ +6V x 0- ~ x ) a ^ 1 ^ f° r 

x G (0, 1), with r(c) = J °° t c-1 e _t dt. The mean of S(a, 6) is ^ and Eln(B(a, 6)) = 
-0(6) — ij)(a + b), where tp is a digamma function. 

Appendix A. 3. Positively Truncated Gaussian Distribution 

The density of a truncated Gaussian random variable x% is denoted by Xi ~ 
N+{xi\ H,n), and its statistics used in the paper are 

Efclzi >0] =E[JV+(a:<;At,»7)] 



E [xf |a?i > 0] = var^x* > 0] + (E [x^x, > 0]) 2 
= ?7 + ^(E [Kil^i > 0]), 



where $n is a cumulative distribution function for the standard normal distribution. 



2G 



Appendix B. Derivations of q(-) 

In this section, we derive the posterior densities defined by variational Bayes frame- 
work in Section [3] 

Appendix B.l. Derivation of q(c) 

We denote the expected value of the squared residual term by R = E||y — Hx|| 2 . 
For ci,l = l,...,K, 

R =E||y - H°x - U ' xc i ~ hJxc j II 2 

=c 2 (x T H jT H J x) - 2 Cj (x T H 3T y - xH jT H°x 
— x T H jT H'cix) + const, 

where H J is the convolution matrix corresponding to the convolution with Kj . For i 7^ j 
and i,j > 0, E(H l x) T (IFx) = tr{W T W : (cov(x) + (x)(x T ))) = (H l (x)) T (H J (x)), 
since tr(H iT H J cov(x)) = tr(WD T WD) = J2 k d l h iK = 0. Here, cov(x) is approxi- 
mated as a diagonal matrix D 2 = diag(d 2 , . . . , d 2 ). This is reasonable, especially when 
the expected recovered signal x exhibits high sparsity. Likewise, E(H°x) T (H : 'x) = 
KjK J E 1 var[x l ] + (H°(x)) T (H J '(x))andE(H J x) T (H-'x) = || Ki || 2 vax[xi]+\\W (x) || 2 . 

Then, we factonze E [—5^2 J = > wlth = — (x^h^wx) ' 

1/<7 C . = <l/a 2 >(x T IP T IFx). 

If we set the prior, p(cj), to be a uniform distribution over a wide range of the 
real line that covers error tolerances, we obtain a normally distributed variational 
density q(cj) = 4>{p, Cj with its mean fi Cj and variance u Cj defined above, because 
\nq(cj) — E [—0^2*] • By the independence assumption, q(c) = J~[ q{cj), so q(c) can be 
easily evaluated. 

Appendix B.2. Derivation of q(o~ 2 ) 

We evaluate R ignoring edge effects; R = ||y - <H)<x)|| 2 + £ var[xi][|| <k) || 2 + 
a Cl \\k,i\\ 2 ] + a Cl ||H ! (x)|| 2 . ||k|| 2 is a kernel energy in £2 sense and the variance 
terms add uncertainty, due to the uncertainty in k, to the estimation of density. 



Applying (19 1, (ignoring constants) 



lng(<7 2 ) = E V 2 [lnp(yjx,c,a 2 )p(CT 2 )p(x|a,w)p(™)p(a)] 

= E x , c [lnp(yjx, a 2 )] + lnp(<7 2 ) 

Ex.c [||y-Hx|| 2 ] p 
= ^ ^ W +lnp(a ). 

WQso, a) = q{° 2 ) = XG(P/2 + (||y - Hx|| 2 )/2 + ft). 
(E\ CT 2 denotes expectation with respect to all variables except a 2 .) 
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Appendix B.3. Derivation of q(x) 

For Xi, i = 1, . . . , N, R = E||ej — hiar-i |] 2 with = y — Hx_i = y — H°x ^ — 
^H'cjx-i, h 8 = [H° +^H ! Ci]i = h? + Eh-a = (ith column of H). Ignoring 
constants, R = (||hi|| 2 ):r 2 — 2(hJ ei)Xi. 

Using the orthogonality of the kernel bases and uncorrelatedness of cj's, we derive 
the following terms (necessary to evaluate R): (||hi|| 2 ) = ||h;|| 2 + E; °cj llh'ill 2 and, 
(hf ei ) = (hf)(y- (HXx-i)) - E, var[ Ci ]hf H'(x_i>. 

Then, var[aji] = w-E [x1\xi > 0] - w'i 2 (E[x l \x i > 0]) 2 , E [x^ = w^E[xi\xi > 0], 
where «4 = q{zt = 1) is the posterior weight for the normal distribution and 1 — w[ is 
the weight for the delta function. The required statistics of Xi that are used to derive 
the distribution above are obtained by applying | Appendix A . 3| 

Appendix B.4- Derivation of q(z) 

To derive q(zi = 1) = (zt), we evaluate the unnormalized version q(zi) of q(zi) and 
normalize it. lng(zj = 1) = E\ z . [— H 6 '"^'!! lna— S.+lniul within ~ N+{^Li,r)i) 

and lng(2:i = 0) = E\ z . ^— — h ln(l — with Xi = 0. The normalized version of 
the weight is q(z, = 1) = 1/[1 + C-]. C- = exp(lng(z t = 0)-\nq(Zi = 1)) = exp(C l /2x 
(l/cr 2 )+/i(l/a) + (lna) + (ln(l-™)-lnw) = exp(Ci/2x<;o/?i+Mao/ai+lnai-i/)(Qo) + 
tp{Po) — tp(Pi)). ip is a digamma function and C; — (||hi|| 2 )(/i 2 + ^) — 2{efhj)/ii. 
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